QMOS - A Robust Visualization Method for Speaker Dependencies with Different Microphones

Authors

  • Andreas Maier
  • Maria Schuster
  • Ulrich Eysholdt
  • Tino Haderlein
  • Tobias Cincarek
  • Stefan Steidl
  • Anton Batliner
  • Stefan Wenhardt
  • Elmar Nöth
Abstract

There are several methods to create visualizations of speech data. All of them, however, lack the ability to remove microphone-dependent distortions. In this work, we examined Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and the COmprehensive Space Map of Objective Signal (COSMOS) method. To address the lack of microphone independence in PCA, LDA, and COSMOS, we present two methods that reduce the influence of the recording conditions on the visualization. The first is a rigid registration of maps created from identical speakers recorded under different conditions, i.e., different microphones and distances. The second is an extension of the COSMOS method that performs a non-rigid registration during the mapping procedure. To measure the quality of the visualization, we computed the mapping error, which occurs during the dimension reduction, and the grouping error, defined as the average distance between the representations of the same speaker recorded by different microphones. The best linear method in leave-one-speaker-out evaluation is PCA plus rigid registration, with a mapping error of 47% and a grouping error of 18%. The proposed COSMOS extension surpasses this, with a mapping error of 24% and a grouping error close to zero.
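The linear baseline in the abstract (PCA followed by rigid registration, scored by a mapping error and a grouping error) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, the rigid registration is done with the standard Kabsch / orthogonal Procrustes solution restricted to proper rotations, Sammon's stress stands in for the mapping error, and the grouping error is normalized by the map diameter. The speaker features are synthetic.

```python
import numpy as np

def pca_map(features, dim=2):
    """Project row vectors onto their first `dim` principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def rigid_register(source, target):
    """Rotate and translate `source` onto `target` (Kabsch solution).

    Assumes row i of both maps is the same speaker; reflections are excluded.
    """
    src_mean, tgt_mean = source.mean(axis=0), target.mean(axis=0)
    cross = (source - src_mean).T @ (target - tgt_mean)
    u, _, vt = np.linalg.svd(cross)
    # Flip the last axis if needed so the result is a proper rotation.
    fix = np.eye(cross.shape[0])
    fix[-1, -1] = np.sign(np.linalg.det(u @ vt))
    rotation = u @ fix @ vt
    return (source - src_mean) @ rotation + tgt_mean

def pairwise(points):
    """Full matrix of Euclidean distances between rows."""
    return np.linalg.norm(points[:, None] - points[None, :], axis=2)

def sammon_stress(high, low):
    """Sammon's stress, used here as a stand-in for the mapping error."""
    upper = np.triu_indices(len(high), k=1)
    d_high, d_low = pairwise(high)[upper], pairwise(low)[upper]
    return ((d_high - d_low) ** 2 / d_high).sum() / d_high.sum()

def grouping_error(map_a, map_b):
    """Mean distance between paired speakers, relative to the map diameter
    (the normalization is an assumption of this sketch)."""
    paired = np.linalg.norm(map_a - map_b, axis=1).mean()
    diameter = pairwise(np.vstack([map_a, map_b])).max()
    return paired / diameter

# Synthetic stand-in data: 20 speakers, 12-dimensional features.
rng = np.random.default_rng(0)
close_talk = rng.normal(size=(20, 12))
# Hypothetical second microphone: mild linear distortion plus noise.
mixing = np.eye(12) + 0.2 * rng.normal(size=(12, 12))
far_field = close_talk @ mixing + 0.05 * rng.normal(size=(20, 12))

map_close, map_far = pca_map(close_talk), pca_map(far_field)
registered = rigid_register(map_far, map_close)
print("mapping error (close-talk map):", sammon_stress(close_talk, map_close))
print("grouping error before registration:", grouping_error(map_far, map_close))
print("grouping error after registration:", grouping_error(registered, map_close))
```

Because PCA leaves the sign of each axis undetermined, two independently computed maps can differ by a reflection that a proper rotation cannot undo; a variant that allows reflections would simply skip the determinant correction.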


Similar resources

An Extension to the Sammon Mapping for the Robust Visualization of Speaker Dependencies

We present a novel method for the visualization of speakers which is microphone independent. To solve the problem of lacking microphone independency we present two methods to reduce the influence of the recording conditions on the visualization. The first one is a registration of maps created from identical speakers recorded under different conditions, i.e., different microphones and distances ...


Multi-channel i-vector combination for robust speaker verification in multi-room domestic environments

In this work we address the speaker verification task in domestic environments where multiple rooms are monitored by a set of distributed microphones. In particular, we focus on the mismatch between the training of the total variability feature extraction hyper-parameters, the enrolment stage, which occurs at a fixed position in the home, and the test phase which could happen in any location of...


Effect of head orientation on the speaker localization performance in smart-room environment

Reliable measures of speaker positions are needed for computational perception of human activities taking place in a smart-room environment. In this work, we investigate the effect of the talker's head orientation on the accuracy of acoustical source localization techniques and its relation with the talker directivity pattern and room reverberation. Two different representative speaker localization ...


Robust Speaker Localization through Ad (AWEPAT) Estim

Time delay of arrival (TDOA) estimation between signals input to two or more microphones plays an important role in speaker localization. Most methods employ a linear array of two or more microphones and use the generalized cross correlation method or eigenspace analysis (AEDA) methods. TDOA estimation with linear arrays, however, is highly sensitive to estimation errors when the signals arrive...


Combination method of bone-conduction speech and air-conduction speech for speaker recognition

Recently, some new sensors, such as bone-conductive microphones, throat microphones, and non-audible murmur (NAM) microphones, besides conventional condenser microphones have been developed for collecting speech data. Accordingly, some researchers began to study speaker and speech recognition using speech data collected by these new sensors. We focus on bone-conduction speech data collected by ...




Publication date: 2009